UtxoIndex: Key UTXOs by DAA score, add paginated and cursor-based API, expose it via get_utxos_by_addresses_v2 RPC #991

Open
D-Stacks wants to merge 18 commits into kaspanet:master from D-Stacks:index_utxoindex_by_daa_score

D-Stacks (Collaborator) commented May 6, 2026

Summary

  • Reorganizes the UTXO index storage to key entries by DAA score for efficient DAA-range scans.
  • Adds utxoindex db migration logic on start-up.
  • Adds a paginated UTXO query over a set of script public keys and an inclusive DAA score range.
  • Pagination uses a cursor (start_address, start_daa_score) plus an optional soft limit, returning next_address/next_daa_score for continuation.
  • Pagination enables efficient polling-based UTXO retrieval, where, for example, a wallet only needs to query a "newer" DAA score range to stay up to date.
  • Exposes the new behavior through get_utxos_by_addresses_v2 and wires it through RPC/gRPC/wasm.
  • Also adds the call to kaspa-cli.

Behavior

  • Req: from_daa_score/to_daa_score define an inclusive range (defaults: None -> 0 and None -> u64::MAX).
  • Req: start_address/start_daa_score resume from a specific point inside the range (defaults: None -> first address in addresses and None -> from_daa_score, so omitting both starts at the beginning of the range and serves as the entry point for the first call).
  • Resp: next_address/next_daa_score indicate the next cursor; None means no more pages.
  • Req: limit is a soft cap on page size; a page may exceed it to finish the current SPK+DAA group (default: None -> u64::MAX).
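
To make the cursor and soft-limit semantics above concrete, here is a minimal in-memory model. It is a sketch under stated assumptions, not the PR's implementation: `Script`, `Outpoint`, `Index`, `Page` and `page` are hypothetical stand-ins, the store is a plain `BTreeMap`, and the cursor is folded into a single `Option<(Script, u64)>`. A cursor script that is not in the list is simply skipped past here, which may differ from the real store and RPC behaviour.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified stand-ins for the real index types.
pub type Script = Vec<u8>;
pub type Outpoint = u32;
/// Model index ordered by (script, DAA score, outpoint); the value is the amount.
pub type Index = BTreeMap<(Script, u64, Outpoint), u64>;

#[derive(Debug, Default, PartialEq)]
pub struct Page {
    pub entries: Vec<(Script, u64, Outpoint)>,
    /// First (script, DAA score) group NOT included; `None` means exhausted.
    pub next: Option<(Script, u64)>,
}

pub fn page(
    index: &Index,
    scripts: &[Script],
    from_daa: Option<u64>,
    to_daa: Option<u64>,
    start: Option<(Script, u64)>,
    limit: Option<u64>,
) -> Page {
    // Defaults as described in the Behavior list.
    let (from, to) = (from_daa.unwrap_or(0), to_daa.unwrap_or(u64::MAX));
    let limit = limit.unwrap_or(u64::MAX);
    let mut sorted = scripts.to_vec();
    sorted.sort();
    sorted.dedup();

    let mut out = Page::default();
    let mut current_group: Option<(Script, u64)> = None;
    for script in &sorted {
        // The cursor's DAA score only applies to the cursor's own script.
        let lower = match &start {
            Some((s, _)) if script < s => continue,
            Some((s, d)) if script == s => (*d).max(from),
            _ => from,
        };
        if lower > to {
            continue;
        }
        let lo = (script.clone(), lower, Outpoint::MIN);
        let hi = (script.clone(), to, Outpoint::MAX);
        for ((s, daa, op), _amount) in index.range(lo..=hi) {
            let group = (s.clone(), *daa);
            if current_group.as_ref() != Some(&group) {
                // Soft limit: stop only at a group boundary, and never on an
                // empty page (modelling choice so the cursor always advances).
                if !out.entries.is_empty() && out.entries.len() as u64 >= limit {
                    out.next = Some(group);
                    return out;
                }
                current_group = Some(group);
            }
            out.entries.push((s.clone(), *daa, *op));
        }
    }
    out
}
```

A polling client would call `page` repeatedly, feeding each `next` back in as `start`, until `next` is `None`.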

Testing

  • cargo test -p kaspa-utxoindex test_page_

Backwards Compatibility

  • The old get_utxos_by_addresses is still fully functional. Only the order of retrieved UTXOs changes, and I do not expect clients to rely on that order for anything.
  • The new call keeps the list-of-addresses input for easier migration.

D-Stacks and others added 18 commits April 30, 2026 21:51

demisrael (Contributor) commented May 9, 2026

Thanks -- the structural change is clean. DAA score moves into the per-entry key, the bucket-prefixed range scan is built on set_iterate_lower_bound / set_iterate_upper_bound, the soft-limit-finishes-group state machine traces correctly across the cross-script boundary, and the upper bound at numeric edges (including to_daa_score == u64::MAX) is sharp. cargo fmt --check, cargo clippy -p kaspa-utxoindex --all-targets --no-deps, and cargo test -p kaspa-utxoindex --lib test_page_ all pass on HEAD.

Three independent peer reviews ran (upstream review + algo peer review on the state machine + architect peer review on design surfaces); the issues consolidate into three MEDIUM concerns to fix before merge and a handful of LOW polish items.

Algo correctness affirmations (independent re-derivation)

Before the issue list -- the algo peer review independently re-derived the four invariants the algorithm needs (group cohesion, next-cursor points to first NOT-included, next-cursor None iff exhausted, no-double-emit on round-trip) and traced each against the implementation. No correctness blocker found at the algorithm level. The affirmations worth surfacing for reviewer confidence:

  • Group cohesion. The inner loop drains the current (script, daa) group fully before stopping (indexed_utxos.rs:329-348), so a single-page response can exceed limit to finish the group. The unbounded single-group case is L6 below (worth documenting on the RPC limit field).
  • Cross-script cursor carry-over. The current_group: Option<(usize, u64)> state at :297 persists across the outer-loop boundary -- this is what makes stop_after_group=true set on the last entry of script A correctly fire on the first entry of script B. The next_script_public_key / next_daa_score always point to the first NOT-included (script, daa) group, never a pushed entry.
  • Cursor stability under address-list mutation. The cursor is a full ScriptPublicKey (not an index), so it is robust to common between-page mutations: re-ordering the address list, adding addresses before or after the cursor, or removing addresses other than the cursor address. The only failure mode is removing the cursor address itself -- position(==) returns None, start_index = script_public_keys.len(), the outer loop is skipped entirely, and the response is an empty page with next_* both None. This is the same silent-empty-equals-exhausted collision M3 documents but reached via a legitimate-client-behaviour path; the RPC validation at service.rs:752-757 rejects this case for external clients, so the failure is contained at the RPC boundary today. Internal callers (mocks, downstream forks, future internal use) inherit the silent ambiguity -- L2 tracks the store-layer contract gap.
  • DAA-range bounds at numeric edges. seek_to = ((to+1)_be || OP_zeros) (indexed_utxos.rs:318-319) is sharp at every numeric edge: u64::MAX (the (to_daa_score < u64::MAX).then(...) guard correctly falls back to the bucket-prefix upper), u64::MAX - 1 (entry at exactly (u64::MAX, OP_zeros) correctly excluded), and 0 (entry at exactly (1, OP_zeros) correctly excluded). Verified against rocksdb 0.24's set_iterate_range + set_iterate_upper_bound override semantics directly (db_options.rs:3920-3998); the upper-bound override preserves the lower bound, and upper bound is exclusive.
  • Sort vs. DB byte order divergence is algorithmically inert (not just currently inert). The pagination walks each bucket independently via one seek_iterator call per bucket -- there is no cross-bucket merge, so the cross-bucket walk order is purely a Rust-side decision and the divergence with DB lex order at ScriptPublicKeyVersion >= 256 does not threaten any invariant. Confirms the documentation-only classification (L7).
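
The numeric-edge argument for the DAA-range upper bound can be checked in isolation. The sketch below assumes a simplified key layout (8-byte big-endian DAA score followed by a fixed 4-byte suffix); `key`, `exclusive_upper` and `in_range` are hypothetical helpers, not names from the PR. It only illustrates why `(to + 1)` padded with zeros is a sharp exclusive bound and why `u64::MAX` needs the fallback.

```rust
/// Hypothetical key layout for illustration: 8-byte big-endian DAA score
/// followed by a fixed-width suffix standing in for the outpoint.
pub const SUFFIX_LEN: usize = 4;

pub fn key(daa: u64, suffix: [u8; SUFFIX_LEN]) -> Vec<u8> {
    let mut k = daa.to_be_bytes().to_vec();
    k.extend_from_slice(&suffix);
    k
}

/// Exclusive upper bound for an inclusive `to_daa_score`.
/// Returns `None` at `u64::MAX`, where `to + 1` would overflow and the caller
/// must fall back to the bucket-prefix upper bound instead.
pub fn exclusive_upper(to_daa_score: u64) -> Option<Vec<u8>> {
    // `then` is lazy, so the addition never runs when the guard is false.
    (to_daa_score < u64::MAX).then(|| key(to_daa_score + 1, [0; SUFFIX_LEN]))
}

/// Byte-wise (lexicographic) comparison, as an ordered key-value store does it.
pub fn in_range(k: &[u8], upper: Option<&[u8]>) -> bool {
    upper.map_or(true, |u| k < u)
}
```

Big-endian encoding is what makes the byte-wise comparison agree with numeric DAA order; a little-endian score would not give a usable range bound.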

MEDIUM (recommend fix before merge)

M1 -- migration nukes the DB dir but leaves the WAL subdir populated. When --rocksdb-wal-dir is configured (args.rs:427-433), conn_builder.rs:139-155 creates <wal_base>/<utxoindex_db_dir.file_name()> and passes it to opts.set_wal_dir(...). The migration block at daemon.rs:619-624 only deletes utxoindex_db_dir, so old v0 WAL files survive. Reopening at the same db_path with the same wal_dir leaves RocksDB recovery free to scan and replay them against the freshly-created v1 schema -- CompactUtxoEntry shrunk from (amount, block_daa_score, is_coinbase) to (amount, is_coinbase) (indexed_utxos.rs:45-49), so any replay is bincode-corrupt. The remove_dir_all-without-WAL pattern exists at daemon.rs:354 and :515 too, but those run on operator-explicit triggers; this PR makes the same shape the default upgrade path for every --utxoindex operator. Suggest adding a sibling fs::remove_dir_all(wal_base.join(utxoindex_db_dir.file_name().unwrap())) between the drop and the recreate.
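
A rough sketch of the suggested cleanup, using only `std::fs`. The helper names (`wal_subdir`, `wipe_db_and_wal`) and the way the WAL base is passed in are assumptions for illustration; the only thing taken from the code under review is the `<wal_base>/<db_dir.file_name()>` layout described above.

```rust
use std::{
    fs, io,
    path::{Path, PathBuf},
};

/// Hypothetical helper: derive the per-DB WAL subdirectory as
/// `<wal_base>/<db_dir.file_name()>`, without unwrapping.
pub fn wal_subdir(wal_base: &Path, db_dir: &Path) -> Option<PathBuf> {
    db_dir.file_name().map(|name| wal_base.join(name))
}

/// Treat an already-missing directory as success.
fn remove_dir_if_exists(dir: &Path) -> io::Result<()> {
    match fs::remove_dir_all(dir) {
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        other => other,
    }
}

/// Remove the DB directory and, when a WAL base is configured, the matching
/// WAL subdirectory, then recreate an empty DB directory.
/// The DB handle must already be dropped before calling this.
pub fn wipe_db_and_wal(db_dir: &Path, wal_base: Option<&Path>) -> io::Result<()> {
    remove_dir_if_exists(db_dir)?;
    if let Some(wal) = wal_base.and_then(|base| wal_subdir(base, db_dir)) {
        remove_dir_if_exists(&wal)?;
    }
    fs::create_dir_all(db_dir)
}
```

Returning `io::Result` rather than unwrapping also lets the caller print an operator-facing message, which overlaps with L1 below.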

M2 -- has_legacy_db_version falls back to tips_exist || supply_exists when the version marker is missing (store_manager.rs:43-67). That heuristic only holds because ensure_db_version_current is the very last write of a successful resync (after insert_circulating_supply and set_tips at index.rs:212-217) -- an invariant nowhere codified. Any future refactor that swaps the order will silently turn "v0 with marker missing" into "fresh DB indistinguishable from v0", flipping the migration branch the wrong way. Cleanest fix: write the version marker as the first write of a fresh resync (so a marker-present DB is well-defined and the tips/supply heuristic can be retired). Alternative: tighten the comment and pin the invariant at the resync write-order site so a future refactor cannot quietly invert it.

M3 -- get_utxos_by_addresses_v2 silently returns an empty page on store-layer error -- indistinguishable from genuine end-of-stream. The shim at service.rs:279-301 drops the store-layer StoreError via .unwrap_or_default() and produces OrderedUtxoSetByScriptPublicKeyPage { entries: [], next_*: None }. The RPC handler then returns GetUtxosByAddressesV2Response { entries: [], next_address: None, next_daa_score: None } -- bit-for-bit identical to a successful exhausted-cursor response. A polling client cannot distinguish "stream exhausted" from "internal error, data silently dropped". The legacy get_utxos_by_addresses RPC has the same shape, but the v2 RPC's cursor contract makes the silent-empty-equals-exhausted collision worse: the contract explicitly promises a cursor-driven stream, and the silent-empty page violates it. Compounds with M1: a corrupt entry from WAL replay would either panic kaspad (via the res.unwrap() at indexed_utxos.rs:324) or surface as silent data loss to the client. Suggest changing the shim signature to propagate the error and translating to RpcError at the handler boundary.
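
The collision is easy to reproduce with toy types. Everything below is hypothetical (`StoreError`, `RpcError`, `Page`, `shim_lossy`, `handler` are simplified stand-ins, not the crate's real definitions); the sketch only contrasts the `unwrap_or_default()` shape with one that propagates the error to the handler boundary.

```rust
// Hypothetical stand-ins for the store-layer and RPC-layer error types.
#[derive(Debug, PartialEq)]
pub struct StoreError(pub String);

#[derive(Debug, PartialEq)]
pub enum RpcError {
    Internal(String),
}

impl From<StoreError> for RpcError {
    fn from(e: StoreError) -> Self {
        RpcError::Internal(format!("utxoindex store error: {}", e.0))
    }
}

// Simplified page: the real one carries UTXO entries and two cursor fields.
#[derive(Debug, Default, PartialEq)]
pub struct Page {
    pub entries: Vec<u64>,
    pub next_daa_score: Option<u64>,
}

/// The shape flagged above: an error collapses into an empty, cursor-less page.
pub fn shim_lossy(store_result: Result<Page, StoreError>) -> Page {
    store_result.unwrap_or_default()
}

/// The suggested shape: propagate, and translate at the handler boundary.
pub fn handler(store_result: Result<Page, StoreError>) -> Result<Page, RpcError> {
    Ok(store_result?)
}
```

With `shim_lossy`, a failed read and an exhausted stream compare equal; with `handler`, the client receives an error it can retry on.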

LOW (polish; not blockers)

L1 -- migration uses unwrap() on filesystem operations. daemon.rs:622-624 -- a graceful operator-facing message via .expect("...") is cheap, and the migration is now the common upgrade path.

L2 -- store-level start_address not in addresses returns silently empty. indexed_utxos.rs:285-292 sets start_index = script_public_keys.len(), the loop body never runs, an empty page is returned. The RPC handler rejects this case explicitly at service.rs:752-757 so external clients are safe -- but the store-layer contract is implicit. Sister cases: store accepts start_daa_score without start_script_public_key (and silently applies it to the first sorted address) and limit = Some(0) (silently behaves as limit = Some(1) modulo group cohesion). Either return StoreError or doc-comment all three preconditions.
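
If the "return an error" option is preferred, the three preconditions could be checked up front. This is only a sketch: `CursorError` and `validate_cursor` are hypothetical names, scripts are modelled as raw byte vectors, and whether `limit = Some(0)` should be rejected or merely documented is a design decision for the author.

```rust
/// Hypothetical error type naming the three implicit preconditions.
#[derive(Debug, PartialEq)]
pub enum CursorError {
    StartScriptNotInSet,
    StartDaaWithoutStartScript,
    ZeroLimit,
}

/// Check the cursor arguments before any iteration starts.
pub fn validate_cursor(
    scripts: &[Vec<u8>],
    start_script: Option<&[u8]>,
    start_daa_score: Option<u64>,
    limit: Option<u64>,
) -> Result<(), CursorError> {
    match (start_script, start_daa_score) {
        // The cursor script must be one of the queried scripts.
        (Some(s), _) if !scripts.iter().any(|spk| spk.as_slice() == s) => {
            return Err(CursorError::StartScriptNotInSet)
        }
        // A DAA cursor without a script cursor is ambiguous.
        (None, Some(_)) => return Err(CursorError::StartDaaWithoutStartScript),
        _ => {}
    }
    if limit == Some(0) {
        return Err(CursorError::ZeroLimit);
    }
    Ok(())
}
```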

L3 -- end-of-stream / limit-exactly-equals-total tests missing. The three new tests cover ordering+range, soft-limit-finishes-group (mid-stream), and resume-from-cursor. Not covered: limit exactly equals total entry count -> next_* should both be None. The state-machine trace shows the natural-end branch handles this correctly, but a regression here would NOT be caught by any existing test. Specific recipe: populate N entries, call with limit = Some(N), assert page.next_script_public_key.is_none() && page.next_daa_score.is_none(). Also worth: empty addresses set, from_daa_score == to_daa_score single-DAA range, the five RPC validation branches (one #[test] each).

L4 -- no test exercises the v0 -> v1 migration end-to-end. Construct a v0-shaped utxoindex DB, run migration, assert post-migration reads match. Highest test gap given M1/M2.

L5 -- migration prompt does not warn about resync duration. daemon.rs:620 -- the resync iterates the entire virtual UTXO set in chunks of 2048 and takes many minutes on a fully-synced mainnet node. Operators who hit the prompt and answer y and then see no progress for several minutes are likely to Ctrl+C -- landing back at M2 (interrupted resync, supply_exists=true, prompt-again-on-next-boot). One sentence ("this rebuild iterates the full UTXO set and may take several minutes on a fully-synced node") would prevent the panic-Ctrl+C -> corrupt-state loop.

L6 -- single-group page size is unbounded. The soft-limit semantic drains the current (script, daa) group fully -- a 100k-UTXO group at one DAA score will return all 100k entries in one call regardless of limit. This is a deliberate design choice (no resume-mid-group cursor; group cohesion is invariant), but the RPC limit field's doc-comment does not say so. Document on get_utxos_from_script_public_keys_by_daa_score_page and on the RPC GetUtxosByAddressesV2Request::limit field.

L7 -- address sort vs. DB byte order divergence is documentation-only. indexed_utxos.rs:279 sorts by (version_numeric, script_bytes); the DB key prefix is (version_le_bytes, script_len_le, script_bytes). They coincide for all currently-used ScriptPublicKey versions (= 0); they diverge for versions >= 256. Pagination correctness is unaffected (each script is iterated independently and the cross-bucket walk order is purely a Rust-side decision -- algorithmically inert per the algo peer review's §6). A one-line comment that pagination order is independent of DB byte order would close the gap.
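
For reference, the divergence can be shown in a few lines. The sketch assumes the version is a `u16` serialized as little-endian bytes in the key prefix, which is what `version_le_bytes` and the `>= 256` threshold suggest; the function names are hypothetical.

```rust
use std::cmp::Ordering;

/// Order used by the Rust-side sort: numeric comparison of the version.
pub fn numeric_order(a: u16, b: u16) -> Ordering {
    a.cmp(&b)
}

/// Order a byte-wise key comparison would see for a little-endian prefix.
pub fn db_byte_order(a: u16, b: u16) -> Ordering {
    a.to_le_bytes().cmp(&b.to_le_bytes())
}

/// True when both orders rank the pair the same way.
pub fn orders_agree(a: u16, b: u16) -> bool {
    numeric_order(a, b) == db_byte_order(a, b)
}
```

Versions 1 and 256 serialize to `[1, 0]` and `[0, 1]`, so the byte order ranks 256 first while the numeric order ranks 1 first; below 256 the high byte is always zero and the two orders coincide.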

L8 -- CLI handler is positional and dense. cli/src/modules/rpc.rs:316-358 -- 5 mandatory positional args after the address list. Easy to misorder; misordering surfaces as a parse error, not a usage error. Flag-based parsing (--from, --to, --start-addr, --start-daa, --limit) would be sturdier.
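
As a rough illustration of the flag-based shape, here is a dependency-free parser sketch. `PageArgs` and `parse_page_args` are hypothetical; the flag names are the ones proposed above, and the real CLI module may well prefer its existing argument plumbing over hand-rolled parsing.

```rust
/// Hypothetical parsed form of the v2 call's CLI arguments.
#[derive(Debug, Default, PartialEq)]
pub struct PageArgs {
    pub addresses: Vec<String>,
    pub from: Option<u64>,
    pub to: Option<u64>,
    pub start_addr: Option<String>,
    pub start_daa: Option<u64>,
    pub limit: Option<u64>,
}

/// Bare arguments are addresses; every `--flag` takes exactly one value.
pub fn parse_page_args(argv: &[&str]) -> Result<PageArgs, String> {
    let mut out = PageArgs::default();
    let mut it = argv.iter().copied();
    while let Some(arg) = it.next() {
        if !arg.starts_with("--") {
            out.addresses.push(arg.to_string());
            continue;
        }
        let value = it.next().ok_or_else(|| format!("missing value for {arg}"))?;
        // Numeric flags share one parser so the error names the offending flag.
        let num = || value.parse::<u64>().map_err(|e| format!("invalid value for {arg}: {e}"));
        match arg {
            "--from" => out.from = Some(num()?),
            "--to" => out.to = Some(num()?),
            "--start-addr" => out.start_addr = Some(value.to_string()),
            "--start-daa" => out.start_daa = Some(num()?),
            "--limit" => out.limit = Some(num()?),
            _ => return Err(format!("unknown flag {arg}")),
        }
    }
    Ok(out)
}
```

Omitted flags stay `None`, which lines up with the request defaults, and a misplaced value produces an error that names the flag instead of a positional parse failure.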

L9 -- two unrelated trailing-whitespace cleanups bundled in. utxo_set.rs:26 and api/mod.rs:54. Optional split into a hygiene PR.

Process / surface notes

P1 -- PR description under-represents the actual change surface. The body's "Backwards Compatibility" claim ("old get_utxos_by_addresses is still fully functional. Only the order of retrieved utxos changes") is true at the wire level but not at the Rust crate level. The body does not mention:

  • database/src/access.rs:217-251 -- seek_iterator gained a seek_to: Option<TKey> parameter. Foundational DB-layer API change; consensus's utxo_set.rs:156 was patched to pass None.
  • New public types in kaspa-index-core: UtxoEntryKeyData, OrderedUtxoCollection, OrderedUtxoSetByScriptPublicKey, OrderedUtxoSetByScriptPublicKeyPage. CompactUtxoCollection's key type changed from TransactionOutpoint to UtxoEntryKeyData -- a compile-break for any downstream crate that imports it and iterates .keys().
  • New trait method on UtxoIndexApi (get_utxos_by_script_public_keys_by_daa_score_page) -- public trait extension; downstream UtxoIndexApi impls (test mocks etc.) need updating.
  • New registry prefix DatabaseStorePrefixes::UtxoIndexDbVersion = 195.
  • RPC_API_REVISION bump 0 -> 1. Sent over the wire by service.rs:1318-1320 on every GetServerInfo, but the only client-side reader at processor.rs:513-516 interpolates it into an error-message string only AFTER the version gate has failed -- bump is correct per convention but not gating client behaviour.

Suggest extending the PR description with a "Wider surface" section so reviewers don't miss them and downstream consumers know what to update on bump.

P2 -- architect-level design questions worth considering for a follow-up (none blocking this PR):

  • The codebase already has a versioned-DB upgrade pattern in MultiConsensusManagementStore with a 'db_upgrade: while loop and per-version handlers. The new utxoindex versioning is a binary "matches-or-not" check -- this locks the index into wipe-and-rebuild forever. Worth generalizing to a VersionedIndexStore trait before the next index store copies this shape.
  • The migration lifecycle at daemon.rs:608-636 duplicates the ConnBuilder chain (28 lines of two-site state) precisely where the WAL-cleanup-gap class lives (M1). An UtxoIndex::open_or_migrate(builder) API that owns the close/rmrf/reopen + WAL-subdir cleanup as a single transaction would structurally prevent that class.
  • The new seek_to: Option<TKey> parameter on access.rs:seek_iterator carries an unstated within-bucket precondition (the upper-bound override silently widens past the prefix range if the caller passes a key that exceeds it). Current pagination caller honours it; a future cross-bucket caller could silently violate prefix isolation. Either tighten the doc-comment or split into seek_iterator_within_bucket / seek_iterator_range.
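
To illustrate the first point, a stepwise upgrade loop with per-version handlers might look roughly like this. It is a sketch only: `LATEST_VERSION`, `upgrade`, and especially the v1 -> v2 step are hypothetical, and it is not a transcription of the `MultiConsensusManagementStore` code.

```rust
/// Hypothetical latest schema version, chosen to show more than one step.
pub const LATEST_VERSION: u32 = 2;

/// Walk the stored version forward one handler at a time, recording each step.
pub fn upgrade(mut version: u32, steps: &mut Vec<&'static str>) -> Result<u32, String> {
    while version != LATEST_VERSION {
        version = match version {
            0 => {
                // The only migration this PR needs: wipe and rebuild.
                steps.push("v0 -> v1: wipe and resync");
                1
            }
            1 => {
                // Hypothetical future step that would not need a full rebuild.
                steps.push("v1 -> v2: in-place transform");
                2
            }
            other => return Err(format!("unknown utxoindex DB version {other}")),
        };
    }
    Ok(version)
}
```

The point of the shape is that a later version can add a cheaper handler without changing how older databases are brought forward.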

Static evidence on HEAD

cwd: kaspanet/rusty-kaspa @ 7372a8156b73f11eca05435370ca4c9fb12fa266
cargo fmt --check -p kaspa-utxoindex -p kaspa-rpc-core -p kaspa-rpc-service -p kaspa-grpc-core -p kaspa-database  -- clean
cargo clippy -p kaspa-utxoindex -p kaspa-rpc-core -p kaspa-rpc-service -p kaspa-grpc-core -p kaspa-database --all-targets --no-deps  -- clean
cargo test  -p kaspa-utxoindex --lib test_page_  -- 3 passed (test_page_ordered_and_range_filtered, test_page_soft_limit_finishes_group, test_page_resumes_from_start_daa_score)

State machine traced by hand through test_page_soft_limit_finishes_group (cross-script current_group carry-over, group cohesion, next_* semantics); see the "Algo correctness affirmations" section above for the independent re-derivation summary.
